A new study led by Apple researchers pours cold water on today's highly anticipated large reasoning models (LRMs). The researchers found that on complex tasks, reasoning models designed to simulate a thinking process, such as Claude 3.7 Thinking and DeepSeek-R1, not only fail to show an advantage but also exhibit serious problems, including insufficient reasoning and outright performance collapse. The study tested the models on four classic logic puzzles: the Tower of Hanoi, Checker Jumping, River Crossing, and Blocks World. These puzzles allow precise control over task complexity.
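To see why such puzzles make good testbeds, consider the Tower of Hanoi: a single parameter, the number of disks, fixes the optimal solution length at exactly 2^n - 1 moves, so difficulty can be dialed up one notch at a time. The Python sketch below is an illustration of that scaling property, not code from the Apple study; it enumerates the optimal move sequence for n disks.

```python
def hanoi_moves(n, source="A", target="C", spare="B"):
    """Yield the optimal move sequence for an n-disk Tower of Hanoi.

    Difficulty is a single knob: n disks require exactly 2**n - 1 moves,
    so task complexity can be controlled precisely.
    """
    if n == 0:
        return
    # Park the n-1 smaller disks on the spare peg...
    yield from hanoi_moves(n - 1, source, spare, target)
    # ...move the largest disk to the target peg...
    yield (source, target)
    # ...then bring the smaller disks back on top of it.
    yield from hanoi_moves(n - 1, spare, target, source)


for n in (3, 5, 10):
    moves = list(hanoi_moves(n))
    assert len(moves) == 2 ** n - 1
    print(f"{n} disks: optimal solution is {len(moves)} moves")
```

Each extra disk doubles the required number of moves, which is what lets researchers scale task complexity step by step and observe exactly where a model's reasoning breaks down.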